In STOC 1999, Raz presented a (partial) function for which there is a quantum protocol
communicating only O(log n) qubits, but for which any classical (randomized, bounded-error)
protocol requires poly(n) bits of communication. That quantum protocol requires two rounds
of communication. Ever since Raz's paper it was open whether the same exponential separation
can be achieved with a quantum protocol that uses only one round of communication.

Here we settle this question in the affirmative. 
1 Introduction

Communication complexity is one of the most basic models in computational complexity, with 
wide-ranging applications in computer science [KN97]. The typical question asked in this model
is the following. Two remote players, call them Alice and Bob, are each given an input and are try- 
ing to compute some function of their inputs while using as little communication as possible. How 
much communication is needed in order to compute the function? The answer to this question 
often depends on what exactly we mean by "compute using as little communication as possible." 
One of the central models in this area is that of randomized (bounded-error) communication. Here we 
allow the players to toss coins, and require them to output the correct answer with probability 
at least (say) 2/3 on any given input. This model is quite powerful and corresponds quite well 
to what is actually achievable in real-world communication. For instance, one of the most basic 
results in this area shows that the players can decide if their inputs are equal using only O(log n)
bits of communication, where n is the size of their inputs in bits. Another well-established model
of communication is that of quantum communication [Yao93]. Here, we allow the players to
communicate quantum states, and to perform quantum operations on them. Although not nearly as
common as classical (i.e., non-quantum) communication, this model is able to provide important 
insights into the power of quantum mechanics. 
The focus of our work is on the relative power of these two central models, a question whose
study started in the late 1990s [BCW98, AST+98]. Most notably, Raz [Raz99] presented a (partial)
function for which there is a quantum protocol communicating only O(log n) qubits, but for which
any classical (randomized, bounded-error) protocol requires poly(n) bits of communication (i.e.,
$\Omega(n^c)$ for some constant c > 0). This result demonstrates that quantum communication is
exponentially stronger than classical communication, and is one of the most fundamental results in
quantum communication complexity. 

However, although Raz's function can be computed using only O(log n) qubits, it seems to
require at least two rounds of communication between Alice and Bob. This naturally leads to the
following fundamental question, which has been open ever since Raz's paper: can a similar
exponential separation be achieved with a quantum protocol that uses only one round of communication?
In other words:

Can quantum one-way communication be exponentially stronger than classical two-way
communication in computing a function?

Such a result might be the strongest possible separation between quantum communication and 
classical communication. 

There have been quite a few partial results in this direction. First, Bar-Yossef, Jayram, and
Kerenidis [BJK04] presented a relational problem (i.e., one in which there is possibly more than
one correct answer for a given input) that has a quantum one-way protocol using only O(log n)
qubits of communication, but for which any classical protocol using only one round of
communication must communicate poly(n) bits. Classical two-way protocols, however, can easily solve their
problem using O(log n) bits. Their result was improved by Gavinsky, Kempe, Kerenidis, Raz,
and de Wolf [GKK+07], who proved the same separation, namely, an O(log n) qubit protocol versus a
poly(n) lower bound for any classical one-way protocol, but in the standard setting of a functional
problem. Again, classical two-way protocols can easily solve the problem using only O(log n) bits.
See also [Mon10] for a similar separation. Another closely related result is by Gavinsky [Gav08],
who improved on Bar-Yossef et al.'s [BJK04] result in the other direction: namely, he showed an
exponential separation between one-way quantum communication and two-way classical
communication (just as in the open question) but for a relational problem. Gavinsky's proof is quite involved,
and it is not clear if his techniques can be used to attack the functional case.

It is important to note that there is a big difference between relational separations and func- 
tional ones, with the latter often being more interesting, involving deeper ideas, and having more 
profound implications. Indeed, the functional separation in [GKK+07] required the use of a
hypercontractive inequality and also provided a surprising counterexample to a conjecture regarding
extractors that are secure against quantum adversaries. Moreover, the existence of a relational
separation often says little about the existence of a functional one; for instance, there are cases where
relational separations provably have no functional counterpart [GRW08].

Here we settle the open question by exhibiting a (partial) function for which there exists a
quantum one-way communication protocol using only O(log n) qubits, but for which any
classical two-way communication protocol must communicate at least poly(n) bits. The function we
consider is actually the complete problem for one-way quantum communication [Kre95] and was
also described in [Raz99]. We call it the Vector in Subspace Problem (VSP). In this problem, Alice is
given an n-dimensional unit vector $u \in S^{n-1}$ and Bob is given a subspace $H \subset \mathbb{R}^n$ of dimension
n/2 with the promise that either $u \in H$ or $u \in H^\perp$. Their goal is to decide which is the case. (For a
formal definition see Section 4.) The quantum protocol for the problem is almost immediate from
the definition: Alice encodes the vector u as a quantum state of $\lceil \log_2 n \rceil$ qubits (by definition, the
state of a quantum system with k qubits is a $2^k$-dimensional unit vector) and sends it to Bob, who,
after having received the quantum state, performs the projective measurement given by $(H, H^\perp)$.
If $u \in H$, Bob is guaranteed to obtain the former outcome; if $u \in H^\perp$, Bob is guaranteed to obtain
the latter outcome.
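
As a sanity check of the measurement step, the following minimal numerical sketch represents H by an orthonormal basis and computes the probability of the outcome "H" as the squared norm of the projection of u onto H. The helper names random_subspace_basis and prob_outcome_H, the dimension n = 16, and the use of NumPy are illustrative assumptions and not part of the protocol's formal description.

```python
import numpy as np

def random_subspace_basis(n, k, rng):
    """Orthonormal basis (as columns) of a random k-dimensional subspace of R^n."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q

def prob_outcome_H(u, basis_H):
    """Probability that the measurement (H, H^perp) on the state u yields outcome H."""
    coeffs = basis_H.T @ u
    return float(coeffs @ coeffs)

rng = np.random.default_rng(0)
n = 16  # illustrative dimension (assumption)
basis_H = random_subspace_basis(n, n // 2, rng)

# A unit vector inside H.
u_in = basis_H @ rng.standard_normal(n // 2)
u_in /= np.linalg.norm(u_in)

# A unit vector inside H^perp: remove the component of a random vector along H.
w = rng.standard_normal(n)
u_out = w - basis_H @ (basis_H.T @ w)
u_out /= np.linalg.norm(u_out)

p_in = prob_outcome_H(u_in, basis_H)    # numerically 1 (up to rounding)
p_out = prob_outcome_H(u_out, basis_H)  # numerically 0 (up to rounding)
num_qubits = int(np.ceil(np.log2(n)))   # qubits needed to encode u
```

In this sketch the outcome probabilities are 1 and 0 up to floating-point error, matching the claim that Bob's answer is guaranteed under the promise.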

It is easy to see that VSP has a classical protocol using O(n log n) bits: Alice simply sends
the vector u to Bob, by specifying each coordinate to within an additive ±1/poly(n) accuracy.
As noted by Raz [Raz99], this protocol is not optimal, and the problem actually has an $O(\sqrt{n})$
protocol, which we will describe in Section 4.

But of course, our focus in this paper is on lower bounds. Our main result is an $\Omega(n^{1/3})$ lower
bound on the (classical) communication complexity of VSP. Previously no lower bound better
than logarithmic was known. Our proof involves some techniques that seem novel in the
computer science literature. We use a hypercontractive inequality, applied in a fashion similar to
that in Kahn, Kalai, and Linial [KKL88] and in other more recent papers, including the result by
Gavinsky et al. mentioned above [GKK+07] (see also [Wol08]). However, unlike previous work,
our hypercontractive inequality is in the setting of functions defined on the sphere. We also use 
the Radon transform and some of its basic properties, as well as a rather delicate martingale ar- 
gument. Finally, we feel that the proof, at least at a very high level, is conceptually simpler than 
some of the previous proofs in this line of work. We hope that our result and techniques will find 
other applications. 

One obvious open question left by our work is to improve the lower bound to a tight $\Omega(n^{1/2})$;
we will mention one possible approach below. Another open question is to strengthen our result
by showing a separation between the quantum simultaneous message passing (SMP) model and
the classical two-way model. This question was recently answered by Gavinsky [Gav09] for
relational problems, but the question for functions seems quite challenging, and it is not even clear if
such a separation can exist. A final important open question is to understand the power of quan- 
tum communication in computing total functions; so far the best known separation is polynomial. 



2 Proof Sketch 

Here we give an informal sketch of the main ideas in the proof of our lower bound, and include 
some remarks regarding the tightness and other aspects of our proofs. The proof starts in Section 4
with a more or less standard application of the rectangle bound which we do not describe here.
This shows that in order to prove our communication lower bound, it suffices to prove the
following sampling statement, which is our main technical theorem (see Figure 1 for an illustration).
The formal statement will appear as Theorem 6.1.

Theorem 2.1 (Informal). Let A be an arbitrary (measurable) subset of the sphere $S^{n-1}$ whose measure
$\sigma(A)$ (under the uniform probability measure on $S^{n-1}$) is at least $\exp(-n^{1/3})$. Assume we choose a
uniformly random subspace $H \subset \mathbb{R}^n$ of dimension n/2. Consider the measure of the set $A \cap H$ under the
uniform probability measure on the unit sphere $H \cap S^{n-1}$ of the n/2-dimensional subspace H. Then this
measure is within a factor of (say) 1 ± 0.1 of $\sigma(A)$ except with probability at most $\exp(-n^{1/3})$.

Figure 1: A subset of $S^2$ and an equator

Before we proceed to discuss the proof of this theorem, we make two remarks. First, it is
interesting to note that this theorem is a considerable strengthening of Lemma 4.1 in [Raz99], which
is a similar sampling statement, but one that applies only to sets A whose measure is constant (or
slightly less). Raz proves that lemma using an elementary (but clever) use of Chernoff's
concentration bound. See also the paper by Milman and Wagner [MW03] for a further discussion and
applications of Raz's sampling lemma. 

The second remark is that our theorem is tight in the sense that there exists a set A of
measure $\exp(-n^{1/3})$ such that the probability of the measure of $A \cap H$ deviating by more than 10%
is essentially $\exp(-n^{1/3})$. This set A is simply a spherical cap, and the bad H's are those that are
close to the center of the cap. We omit this standard calculation. One implication of this is that
improving our $\Omega(n^{1/3})$ lower bound to a tight $\Omega(n^{1/2})$ is probably impossible using the rectangle
bound, and one might have to use instead the smooth rectangle bound introduced in [Kla10, JK10]
and used recently in [CR10]. For the interested reader, we note that the following reasonable
sampling statement would imply the tight $\Omega(n^{1/2})$ bound. Let A be an arbitrary subset of the sphere
$S^{n-1}$ whose measure $\sigma(A)$ is at least $\exp(-n^{1/2})$, and assume we choose a uniformly random
subspace $H \subset \mathbb{R}^n$ of dimension n/2. We now consider the measure of the set $A \cap H$ and that of
the set $A \cap H^\perp$ (under the appropriate uniform probability measures). Then the goal would be to
prove that the average of these two measures is at least $0.9\,\sigma(A)$ except with probability at most
$\exp(-n^{1/2})$.

Theorem 2.1 is proven by a recursive application of the following core sampling statement for
(n−1)-dimensional subspaces. Roughly speaking, it shows that sampling a set of measure at least
$\exp(-n^{1/3})$ using a random (n−1)-dimensional subspace gives an error that is typically at most
$1 \pm n^{-2/3}$ and has an exponential decay. The formal statement will appear as Theorem 5.1.

Theorem 2.2 (Informal). Let $A \subseteq S^{n-1}$ be of measure at least $\exp(-n^{1/3})$. Assume we choose a
uniformly random subspace $H \subset \mathbb{R}^n$ of dimension n−1. Then, for any 0 < t < 1, the measure of $A \cap H$
(under the uniform measure on $H \cap S^{n-1}$) is within a factor of 1 ± t of $\sigma(A)$ except with probability at
most $\exp(-n^{2/3} t)$.

Section 6 will be dedicated to deriving Theorem 2.1 from the above theorem. This is done
using a martingale argument and Bernstein-type inequalities; in the following we just give the
rough idea. Consider the following equivalent way to choose a uniformly random subspace H of
dimension n/2. First, let $H_0 = \mathbb{R}^n$. Then, choose a uniformly random subspace $H_1 \subset H_0 = \mathbb{R}^n$
of dimension n−1; then, choose a uniformly random subspace $H_2$ of $H_1$ of dimension n−2;
continue in the same fashion until $H = H_{n/2}$, which is a uniformly random n/2-dimensional
subspace of $H_{n/2-1}$. We now consider the sequence of measures of $A \cap H_i$ (with respect to the
uniform measure on $S^{n-1} \cap H_i$) for i = 0, ..., n/2. By definition, this sequence starts with $\sigma(A)$.



According to Theorem 2.2, at each step of the sequence we typically get an extra multiplicative
error of $1 \pm n^{-2/3}$. After n/2 steps, the accumulated error becomes $1 \pm \sqrt{n} \cdot n^{-2/3} = 1 \pm n^{-1/6}$
(this of course requires a proof since, e.g., the steps are not independent). Hence, assuming the
error has a Gaussian tail (which is also far from obvious), and recalling that the probability that a
Gaussian variable deviates by more than t standard deviations is roughly $\exp(-t^2)$, we obtain that
the probability of seeing a total deviation of more than 1 ± 0.1 is at most $\exp(-n^{1/3})$, as required.



We remark that we also have an alternative and direct proof of Theorem 2.1 that is similar in
nature to the proof of Theorem 2.2 (as described below), except it uses the Grassmannian manifold;
this proof, unfortunately, currently leads to a worse bound of $\exp(-n^{1/4})$ (instead of $\exp(-n^{1/3})$)
and is therefore omitted. It is quite possible that this direct proof can be improved to obtain the
tight $\exp(-n^{1/3})$ bound.



The proof of Theorem 2.2 will be given in Section 5. It uses the hypercontractive inequality
for the sphere, applied in a fashion similar to the one done by Kahn, Kalai, and Linial [KKL88],
as well as some basic properties of the Radon transform. In order to demonstrate these ideas in
a setting that might be more familiar to some readers, we spend the remainder of this section on
proving an analogous statement in the setting of the Boolean hypercube $\{0,1\}^n$, and for simplicity
just consider the case $t = n^{-1/3}$ (the general case is similar).



Sampling statement for the Boolean cube. Let n be an even integer. For a vector $y \in \{0,1\}^n$
define $y^\perp = \{z \in \{0,1\}^n ;\; \mathrm{HamDist}(y,z) = n/2\}$ as the "equator orthogonal to y". Let $A \subseteq \{0,1\}^n$
be of measure $\mu(A) := |A|/2^n$ at least $\exp(-n^{1/3})$. Assume we choose a uniform $y \in \{0,1\}^n$, and
consider the fraction of points in $y^\perp$ that are contained in A. Then our goal is to show that this
fraction is in $(1 \pm n^{-1/3})\mu(A)$ except with probability at most $\exp(-n^{1/3})$.

As stated, this statement is actually false due to a parity issue; this can be seen, e.g., by taking 
A to be all points of even Hamming weight, a set of measure 1/2. Then the fraction of points in
$y^\perp$ that are contained in A is either 0 or 1 depending on the parity of y. Although the statement
can be easily mended, in the sequel we ignore this issue and proceed with an incomplete proof of 
the original incorrect statement. We allow ourselves to do this because this parity issue does not 
arise in the setting of the sphere, and the argument below becomes a valid proof there (with the 
necessary modifications, of course). 
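
The parity issue can be observed directly by brute force on a small cube. The sketch below is an illustration only, with n = 6 and the helper names equator and fraction_in_even_weight chosen here as assumptions. It enumerates every y, takes the set of points at Hamming distance exactly n/2 from y, and records what fraction of that set has even Hamming weight.

```python
from itertools import product

def equator(y):
    """All points of the cube at Hamming distance exactly n/2 from y."""
    n = len(y)
    return [z for z in product((0, 1), repeat=n)
            if sum(a != b for a, b in zip(y, z)) == n // 2]

def fraction_in_even_weight(y):
    """Fraction of the equator of y lying in A = {points of even Hamming weight}."""
    eq = equator(y)
    return sum(1 for z in eq if sum(z) % 2 == 0) / len(eq)

n = 6  # small illustrative dimension (assumption)
fractions = {fraction_in_even_weight(y) for y in product((0, 1), repeat=n)}
print(sorted(fractions))  # prints [0.0, 1.0]
```

Every point of the equator of y has Hamming weight congruent to |y| + n/2 modulo 2, so the fraction is never close to 1/2, which is why the statement fails as written for this set A.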

The above sampling statement can be stated in the following essentially equivalent way. For
any $A, B \subseteq \{0,1\}^n$ of measure at least $\exp(-n^{1/3})$,

$$\Pr_{y \sim B,\; x \sim y^\perp}[x \in A] \in (1 \pm n^{-1/3})\,\mu(A), \tag{1}$$

where the notation $x \sim E$ means that x is distributed uniformly in the set E, and the right hand
side indicates the interval $[(1 - n^{-1/3})\mu(A), (1 + n^{-1/3})\mu(A)]$. For a function $f : \{0,1\}^n \to \mathbb{R}$,
define its Radon transform $R(f) : \{0,1\}^n \to \mathbb{R}$ as

$$R(f)(y) := \mathbb{E}_{x \sim y^\perp}[f(x)].$$

Define $f = 1_A/\mu(A)$ and $g = 1_B/\mu(B)$ to be the indicator functions of A and B normalized so that
their expectations over a uniform input are $\mathbb{E}_x[f(x)] = \mathbb{E}_x[g(x)] = 1$. With this notation, Eq. (1)
becomes

$$\langle f, R(g) \rangle = \mathbb{E}_x[f(x) R(g)(x)] \in 1 \pm n^{-1/3}. \tag{2}$$

For a function $f : \{0,1\}^n \to \mathbb{R}$, define its Fourier transform $\hat{f} : \{0,1\}^n \to \mathbb{R}$ by
$\hat{f}(w) := \mathbb{E}_x[(-1)^{\langle w, x \rangle} f(x)]$. Then by the orthogonality of the Fourier transform, Eq. (2) can be written
equivalently as

$$\langle \hat{f}, \widehat{R(g)} \rangle = \sum_{w \in \{0,1\}^n} \hat{f}(w)\, \widehat{R(g)}(w) \in 1 \pm n^{-1/3}.$$

An easy direct calculation reveals that R is diagonal in the Fourier basis. (Alternatively, one
can use Schur's lemma and the fact that R commutes with translations.) This calculation also
reveals that the eigenvalue corresponding to $w \in \{0,1\}^n$ is 0 whenever the Hamming weight of w
is odd, 1 when the Hamming weight of w is 0,

$$\frac{\binom{n-2}{n/2} - 2\binom{n-2}{n/2-1} + \binom{n-2}{n/2-2}}{\binom{n}{n/2}} \approx -\frac{1}{n}$$

when w is of Hamming weight 2, approximately $3/n^2$ when w is of Hamming weight 4, etc. We can
therefore write

$$\langle \hat{f}, \widehat{R(g)} \rangle \approx \hat{f}(0)\hat{g}(0) - \frac{1}{n} \sum_{|w|=2} \hat{f}(w)\hat{g}(w) + \frac{3}{n^2} \sum_{|w|=4} \hat{f}(w)\hat{g}(w) - \cdots.$$
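
The eigenvalues quoted above can be checked exactly for a small n. Since R commutes with translations, the eigenvalue attached to a character of Hamming weight k equals the average of $(-1)^{x_1 + \cdots + x_k}$ over all x of Hamming weight n/2. The following sketch computes this average with exact rational arithmetic; the function name radon_eigenvalue and the choice n = 12 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def radon_eigenvalue(n, k):
    """Average of (-1)^(x_1 + ... + x_k) over x in {0,1}^n of Hamming weight n/2."""
    total = 0
    for ones in combinations(range(n), n // 2):
        overlap = sum(1 for i in ones if i < k)  # ones among the first k coordinates
        total += (-1) ** overlap
    return Fraction(total, comb(n, n // 2))

n = 12  # small illustrative dimension (assumption)
eig = {k: radon_eigenvalue(n, k) for k in range(5)}
# eig[0] = 1, eig[1] = eig[3] = 0,
# eig[2] = -1/(n-1) = -1/11, eig[4] = 3/((n-1)(n-3)) = 1/33
```

For this n the exact values are −1/(n−1) at weight 2 and 3/((n−1)(n−3)) at weight 4, in line with the approximations −1/n and 3/n².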

The first term is $\hat{f}(0)\hat{g}(0) = \mathbb{E}_x[f(x)]\,\mathbb{E}_x[g(x)] = 1$. Hence our goal is to bound the remaining
terms by $n^{-1/3}$. For simplicity, let us focus on the first of these terms, and show that $\sum_{|w|=2} \hat{f}(w)\hat{g}(w)$ is at
most $n^{2/3}$ in absolute value; one can similarly analyze the remaining terms and show that their
total contribution is similar (see footnote 1). By using the Cauchy-Schwarz inequality, we can bound this sum by
$\big(\sum_{|w|=2} \hat{f}(w)^2\big)^{1/2} \big(\sum_{|w|=2} \hat{g}(w)^2\big)^{1/2}$. The following lemma now completes the proof.

Lemma 2.3. Let $A \subseteq \{0,1\}^n$ be of measure $\mu$, and let $f = 1_A/\mu$ be its (normalized) indicator
function. Then, for some universal constant C > 0,

$$\sum_{|w|=2} \hat{f}(w)^2 \le C (\log(1/\mu))^2.$$

1 This is where we are cheating: the term |w| = n can contribute a lot to this sum.
Equivalently, the lemma says that if $X = (x_1, \ldots, x_n)$ is uniformly chosen from A, then the
sum over all pairs {i, j} of the bias squared of $x_i \oplus x_j$ is at most $C(\log(1/\mu))^2$. This can be seen
to be essentially tight by taking, e.g., $A = \{x \in \{0,1\}^n ;\; x_1 = \cdots = x_{\log_2(1/\mu)} = 0\}$. This lemma is
proven by applying the Bonami-Gross-Beckner hypercontractive inequality [Bon70, Gro75, Bec75]
in a way similar to that in [KKL88]. Essentially the exact same lemma appears in [GKK+07], and
is also described in detail in the survey [Wol08]. We include a sketch of the proof, as later on we
will have a similar proof in the spherical setting (in Lemma 5.3).



Proof. The hypercontractive inequality for the Boolean cube states that for any $f : \{0,1\}^n \to \mathbb{R}$,
and $1 \le p \le 2$,

$$\| T_{\sqrt{p-1}} f \|_2 \le \| f \|_p,$$

where $T_\rho$ is the noise operator with parameter $\rho$ (which is the operator that is diagonal in the
Fourier basis, and has eigenvalue $\rho^k$ for each Fourier basis function of level k), and the pth norm
is defined as $\|f\|_p = \mathbb{E}_x[|f(x)|^p]^{1/p}$. By plugging in our f we obtain

$$\sum_{|w|=2} \hat{f}(w)^2 \le \frac{1}{(p-1)^2} \sum_{w} (p-1)^{|w|} \hat{f}(w)^2 = \frac{1}{(p-1)^2} \| T_{\sqrt{p-1}} f \|_2^2 \le \frac{1}{(p-1)^2} \| f \|_p^2 = \frac{1}{(p-1)^2}\, \mu^{-2(1-1/p)}.$$

The lemma follows by optimizing over 1 < p < 2; for instance, for $\mu \le 1/e$, taking $p = 1 + 1/\log(1/\mu)$
gives the bound $e^2 (\log(1/\mu))^2$. □
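
The tightness example mentioned before the proof can be verified by brute force for small parameters. The sketch below computes the level-2 Fourier weight of the normalized indicator of the subcube $A = \{x ;\; x_1 = \cdots = x_k = 0\}$, for which $\mu = 2^{-k}$. The function name level2_weight and the parameters n = 8, k = 3 are illustrative assumptions.

```python
from itertools import combinations, product

def level2_weight(n, k):
    """Level-2 Fourier weight of f = 1_A / mu for A = {x : x_1 = ... = x_k = 0}."""
    points = list(product((0, 1), repeat=n))
    mu = 2.0 ** (-k)
    f = {x: (1.0 / mu if not any(x[:k]) else 0.0) for x in points}
    weight = 0.0
    for i, j in combinations(range(n), 2):
        # Fourier coefficient at w = e_i + e_j: E_x[(-1)^(x_i + x_j) f(x)].
        coeff = sum((-1) ** (x[i] + x[j]) * f[x] for x in points) / len(points)
        weight += coeff ** 2
    return weight

n, k = 8, 3  # illustrative parameters (assumption)
w2 = level2_weight(n, k)  # equals k(k-1)/2 = 3
```

Only the pairs {i, j} inside the first k coordinates contribute, each with coefficient 1, so the weight is k(k−1)/2 with $k = \log_2(1/\mu)$, which grows like $(\log(1/\mu))^2$ as in the lemma.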

3 Preliminaries 

General. Throughout the paper, by "measurable" we mean Borel measurable. All logarithms are 
natural logarithms unless otherwise specified. We adopt the following convention for denoting 
constants. The letters c, c', C, C', etc. stand for various positive universal constants, whose value
may change from one line to the next. We usually use upper-case C to denote universal constants 
that we think of as "sufficiently large", and lower-case c to denote universal constants that are 
"sufficiently small". 

Some manifolds and uniform distributions on them. Write $S^{n-1} = \{x \in \mathbb{R}^n ;\; |x| = 1\}$ for the
unit sphere in $\mathbb{R}^n$. We denote by $\sigma$ the uniform probability measure on $S^{n-1}$, i.e., the unique
rotationally-invariant probability measure on $S^{n-1}$ (see, e.g., [MS86, Chapter I] for more
information on Haar measures). We denote by $\mathcal{G}_{n,m}$ the Grassmannian manifold, i.e., the manifold of
all m-dimensional subspaces in $\mathbb{R}^n$, and we let $\sigma_g$ be the uniform distribution over it (or, more
formally, the unique rotationally-invariant probability measure on $\mathcal{G}_{n,m}$). We also consider the
incidence manifold

$$\mathcal{I}_{n,m} = \{ (x, H) \in S^{n-1} \times \mathcal{G}_{n,n-m} ;\; x \in H \} \subset S^{n-1} \times \mathcal{G}_{n,n-m},$$

and let $\sigma_{\mathcal{I}}$ be the uniform probability measure on it (or more precisely, the unique
rotationally-invariant probability measure on $\mathcal{I}_{n,m}$). We will implicitly use some basic properties of these
manifolds and the uniform distributions on them; for a rigorous discussion of the topic, see, e.g.,
Helgason [Hel99, Chapter II].

4 Communication Complexity 

In this section we give a formal definition of the VSP problem, and derive the main lower bound 
from the sampling statement. Our discussion in this section closely follows Raz's [Raz99], hence
we will occasionally allow ourselves to be brief. We also assume some basic familiarity with
randomized communication complexity [KN97].

We start with the formal definition of VSP. This is identical to the $P_0$ problem defined in [Raz99].

Definition 4.1. Let $0 \le \vartheta < 1/\sqrt{2}$ be a parameter. In the $\mathrm{VSP}_\vartheta$ problem, Alice is given an
n-dimensional unit vector $u \in S^{n-1}$ and Bob is given a subspace $H \subset \mathbb{R}^n$ of dimension n/2. They
are promised that either the distance of u from H is at most $\vartheta$ or the distance of u from $H^\perp$ is at
most $\vartheta$. Their goal is to decide which is the case.

This problem was first defined by Kremer [Kre95] and was shown to be a complete problem for
one-round quantum communication complexity. In particular, for any $0 \le \vartheta < 1/\sqrt{2}$, $\mathrm{VSP}_\vartheta$ has
an (almost immediate) quantum protocol communicating only O(log n) qubits in a single message
from Alice to Bob. (Moreover, there is a matching $\Omega(\log n)$ lower bound.)

In terms of its classical (randomized, bounded-error) communication complexity, Raz [Raz99]
has shown that the problem has an $O(\sqrt{n})$ communication protocol, which we now briefly
describe. Assume Alice and Bob use their shared randomness to pick a sequence of unit vectors
$v_1, v_2, \ldots$ chosen uniformly from $S^{n-1}$. Alice looks for the vector $v_i$ with the maximal inner
product $v_i \cdot u$ among the first $2^{C\sqrt{n}}$ unit vectors, and sends the index i to Bob, who decides on the
output based on which of H and $H^\perp$ is closer to $v_i$. The protocol clearly requires only $O(\sqrt{n})$ bits
of communication, and moreover, the output produced by Bob is correct with high probability
(essentially since the projection squared of $v_i$ on H (or $H^\perp$) gets an addition of $n^{-1/2}$ due to the
high inner product with u, which is sufficient to noticeably affect Bob's answer since the standard
deviation of the projection squared is of order $n^{-1/2}$). Using Newman's theorem, the shared
randomness can be replaced with private randomness by only communicating an extra O(log n) bits
(which is negligible). For a more detailed proof, see Theorem 3.8 in [Raz99].
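
The protocol just described can be simulated for small parameters. The sketch below is a rough illustration under stated assumptions rather than a faithful rendering of the analysis in [Raz99]: it fixes n = 32 and 4096 shared vectors (a 12-bit message), models shared randomness by one seeded generator, and lets Bob answer "in H" when the squared projection of the received vector onto H exceeds 1/2, which is equivalent to H being closer than $H^\perp$. The function names and all numeric parameters are hypothetical.

```python
import numpy as np

def random_unit_vectors(count, n, rng):
    """Rows are independent uniformly random unit vectors in R^n."""
    v = rng.standard_normal((count, n))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def run_protocol(u, basis_H, shared):
    """One run: Alice sends an index, Bob returns True for 'u in H'."""
    i = int(np.argmax(shared @ u))                  # Alice's message: index of best v_i
    proj_sq = np.sum((basis_H.T @ shared[i]) ** 2)  # squared projection of v_i on H
    return bool(proj_sq > 0.5)                      # True iff H is closer than H^perp

rng = np.random.default_rng(1)
n, num_vectors, trials = 32, 4096, 40  # illustrative parameters (assumption)
correct = 0
for t in range(trials):
    # Random orthonormal basis; first half spans H, second half spans H^perp.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    basis_H, basis_perp = q[:, : n // 2], q[:, n // 2:]
    in_H = (t % 2 == 0)
    basis = basis_H if in_H else basis_perp
    u = basis @ rng.standard_normal(n // 2)
    u /= np.linalg.norm(u)
    shared = random_unit_vectors(num_vectors, n, rng)  # shared randomness
    correct += int(run_protocol(u, basis_H, shared) == in_H)
accuracy = correct / trials
```

With these parameters Bob is correct on most runs, though not on all of them; the asymptotic guarantee relies on choosing the constant C in $2^{C\sqrt{n}}$ large enough.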

However, no lower bound better than logarithmic was previously known. Our main result
is an $\Omega(n^{1/3})$ lower bound on the randomized communication complexity of the problem $\mathrm{VSP}_0$
(which is the problem described in the introduction). One minor caveat here is that this lower
bound holds only for protocols that are "measurable," in the sense that the functions describing
the behavior of the players need to be measurable. Clearly, increasing $\vartheta$ can only make the problem
harder, hence our lower bound also applies to any $0 \le \vartheta < 1/\sqrt{2}$. Moreover, as we shall see below,
there is no need to assume measurability in the case $\vartheta > 0$.

Another point to note is that the number of possible inputs to VSP is infinite. Although there is
nothing terribly wrong with this, in the standard communication complexity model problems are
supposed to have inputs that are taken from a finite set. This can be easily achieved by specifying
the inputs using an n-dimensional vector (for Alice) together with an n × n/2 matrix (for Bob)
each of whose entries is described by O(log n) bits. We denote this problem by $\widetilde{\mathrm{VSP}}_\vartheta$. Since this is a
restriction of $\mathrm{VSP}_\vartheta$, we clearly still have a one-way O(log n) qubit protocol. Next, notice that for any
$0 < \vartheta < 1/\sqrt{2}$, we can convert any protocol for $\widetilde{\mathrm{VSP}}_\vartheta$ into a protocol for $\mathrm{VSP}_0$ by simply rounding
the coordinates of the inputs. Moreover, the resulting $\mathrm{VSP}_0$ protocol is clearly measurable since
its input space is partitioned into a finite number of simple sets, and the protocol's behavior is
completely determined on each of these simple sets. We therefore obtain a lower bound of $\Omega(n^{1/3})$
on the randomized communication complexity of $\widetilde{\mathrm{VSP}}_\vartheta$ for any $0 < \vartheta < 1/\sqrt{2}$. Notice that the
problem's input size is $m = O(n^2 \log n)$, and hence in terms of the input size, our lower bound
is $\Omega((m/\log m)^{1/6})$. Finally, since $\widetilde{\mathrm{VSP}}_\vartheta$ is a restriction of $\mathrm{VSP}_\vartheta$, we also obtain a lower bound of
$\Omega(n^{1/3})$ on the randomized communication complexity of $\mathrm{VSP}_\vartheta$ for any $0 < \vartheta < 1/\sqrt{2}$, without
the measurability assumption. We summarize this discussion in the following theorem, which we
then proceed to prove.

Theorem 4.2. Any measurable randomized (bounded-error) protocol for $\mathrm{VSP}_0$ requires $\Omega(n^{1/3})$ bits of
communication. As a result, we obtain that for all $0 < \vartheta < 1/\sqrt{2}$, the randomized communication
complexity of both $\mathrm{VSP}_\vartheta$ and $\widetilde{\mathrm{VSP}}_\vartheta$ is $\Omega(n^{1/3})$ (without any measurability assumption).

Proof. As described above, it suffices to prove the lower bound on $\mathrm{VSP}_0$. Fix an arbitrary
randomized protocol communicating at most D bits, and assume that it solves $\mathrm{VSP}_0$ with error probability
at most 1/3 on all legal inputs. (The argument applies to any error probability smaller than 1/2
by a standard amplification technique.) Our goal is to lower bound D.

Recall the definition of $\mathcal{I}_{n,n/2}$ and the uniform distribution $\sigma_{\mathcal{I}}$ on it, given by a uniformly
chosen subspace H of dimension n/2 and a uniformly chosen unit vector u in H. We also define the
set $\bar{\mathcal{I}}_{n,n/2}$ as the set of all pairs $(x, H) \in S^{n-1} \times \mathcal{G}_{n,n/2}$ such that $x \in H^\perp$, and let $\sigma_{\bar{\mathcal{I}}}$ be the uniform
distribution on it, given by a uniformly chosen subspace H of dimension n/2 and a uniformly
chosen unit vector u in $H^\perp$.

We consider the following two quantities. The first is the probability that the protocol
incorrectly outputs "u not in H" when the inputs are chosen from $\sigma_{\mathcal{I}}$. The second is the probability that
the protocol incorrectly outputs "u in H" when the inputs are chosen from $\sigma_{\bar{\mathcal{I}}}$. By our assumption,
each of these quantities is at most 1/3, and hence their sum is at most 2/3. By linearity there exists 
a way to fix the random string used by the protocol such that the resulting deterministic protocol 
also satisfies that the sum of these two quantities is at most 2/3. From now on we consider that 
deterministic protocol. 

As is well known, such a deterministic protocol induces a partition of $S^{n-1} \times \mathcal{G}_{n,n/2}$ into $2^D$
rectangles, i.e., measurable sets of the form A × B where $A \subseteq S^{n-1}$ and $B \subseteq \mathcal{G}_{n,n/2}$, where each
rectangle is labelled with "in" or "not in", corresponding to the protocol's output on inputs from
this rectangle. In order to analyze this partition, we use the following lemma, which follows easily
from our main sampling theorem, as will be shown in Section 6.

Lemma 4.3. Suppose that $A \subseteq S^{n-1}$ and $B \subseteq \mathcal{G}_{n,n/2}$ are measurable sets with

$$\sigma(A) \ge C\exp(-cn^{1/3}), \qquad \sigma_g(B) \ge C\exp(-cn^{1/3})$$

for some universal constants c, C > 0. Then,

$$\sigma_{\mathcal{I}}\big((A \times B) \cap \mathcal{I}_{n,n/2}\big) \ge 0.8\, \sigma(A)\sigma_g(B).$$

As a result, we obtain that for all measurable sets $A \subseteq S^{n-1}$ and $B \subseteq \mathcal{G}_{n,n/2}$,

$$\sigma_{\mathcal{I}}\big((A \times B) \cap \mathcal{I}_{n,n/2}\big) \ge 0.8\, \sigma(A)\sigma_g(B) - C\exp(-cn^{1/3}). \tag{3}$$

By simply replacing H with $H^\perp$ we also obtain that

$$\sigma_{\bar{\mathcal{I}}}\big((A \times B) \cap \bar{\mathcal{I}}_{n,n/2}\big) \ge 0.8\, \sigma(A)\sigma_g(B) - C\exp(-cn^{1/3}). \tag{4}$$

We now sum the inequalities (3) over all rectangles A × B that are labelled with "not in" and the
inequalities (4) over all rectangles labelled with "in". Our assumption above says precisely that the
left hand side is at most 2/3. Since the rectangles partition the input space, the products $\sigma(A)\sigma_g(B)$
sum to 1, so the right hand side is exactly $0.8 - 2^D \cdot C\exp(-cn^{1/3})$. Rearranging,
we obtain that $2^D \ge c\exp(cn^{1/3})$, as required. □

5 Sampling Sets by Equators 

In this section we prove one of the main components of our proof, namely, a sampling theorem 
using equators: we show that any (not too small) subset A of the sphere $S^{n-1}$ is sampled well
by a randomly chosen equator (where an equator is the intersection of $S^{n-1}$ with an
(n−1)-dimensional subspace). See Figure 1.

Theorem 5.1. Let $A \subseteq S^{n-1}$ be a measurable set. Assume H is a uniformly chosen (n−1)-dimensional
subspace. Then, for any 0 < t < 1, the probability that

$$\left| \frac{\sigma_H(A \cap H)}{\sigma(A)} - 1 \right| \ge t$$

is at most $C\exp(-cnt/\log(2/\sigma(A)))$ for some universal constants C, c > 0, where $\sigma_H$ denotes the
uniform probability measure on the sphere $H \cap S^{n-1}$.

In the rest of this section, we actually prove the following more symmetric statement, from
which Theorem 5.1 follows as described below. Here we denote by $V_n$ the manifold of all pairs of
orthogonal vectors,

$$V_n = \{ (x,y) \in S^{n-1} \times S^{n-1} ;\; x \cdot y = 0 \},$$

and we let $\sigma_V$ denote the uniform probability measure over $V_n$.

Theorem 5.2. Suppose $f, g : S^{n-1} \to [0, \infty)$ are bounded measurable functions with
$\int_{S^{n-1}} f\,d\sigma = \int_{S^{n-1}} g\,d\sigma = 1$ and set

$$s = \log(2\|f\|_\infty) \cdot \log(2\|g\|_\infty).$$

Then, when $s \le cn$,

$$\left| \int_{V_n} f(x) g(y)\, d\sigma_V(x,y) - 1 \right| \le \frac{Cs}{n},$$

where C, c > 0 are universal constants.
In order to derive Theorem 5.1, let E be the set of all $y \in S^{n-1}$ for which the subspace $y^\perp \subset \mathbb{R}^n$
orthogonal to y satisfies

$$\frac{\sigma_{y^\perp}(A \cap y^\perp)}{\sigma(A)} \ge 1 + t.$$

Let $f = 1_A/\sigma(A)$ and $g = 1_E/\sigma(E)$ be the normalized indicator functions of A and E, respectively.
Then it follows that

$$\int_{V_n} f(x) g(y)\, d\sigma_V(x,y) \ge 1 + t$$

since the left hand side is exactly the average of $\sigma_{y^\perp}(A \cap y^\perp)/\sigma(A)$ over y chosen uniformly from
E. Hence by Theorem 5.2,

$$t \le \frac{C \log(2/\sigma(A)) \log(2/\sigma(E))}{n}.$$

Rearranging, we obtain that

$$\sigma(E) \le C \exp(-cnt/\log(2/\sigma(A))).$$

Repeating a similar argument for the lower bound, we obtain Theorem 5.1.

Our proof of Theorem 5.2 resembles a small jigsaw puzzle, in which all of the pieces are known 
mathematical constructions that have to be put in place in order to yield a proof. Therefore most 
of this section is devoted to a brief summary of standard mathematical material, such as some 
basic features of spherical harmonics, the Laplacian, log-Sobolev inequalities, hypercontractivity,
growth of $L^p$ norms of eigenfunctions, and the Radon transform and its eigenvalues.



Spherical harmonics. We write $L^2(S^{n-1})$ for the space of all square-integrable functions on
$S^{n-1}$. For $U \in SO(n)$ and $f \in L^2(S^{n-1})$ denote

\[ U(f)(x) = f(U^{-1}x) \qquad (x \in S^{n-1}). \]

We say that $U(f)$ is the rotation of $f$ by $U$. For any integer $k \ge 0$, there is a special
finite-dimensional subspace $\mathcal{S}_k \subset L^2(S^{n-1})$ of smooth functions called the space
of "spherical harmonics of degree $k$." For instance, $\mathcal{S}_0$ is the one-dimensional space of
constant functions. More generally, $\mathcal{S}_k$ is defined as the restriction to the sphere of all
harmonic, homogeneous polynomials of degree $k$ in $\mathbb{R}^n$. See, e.g., Müller [Mul66] or Stein
and Weiss [SW71] for a quick introduction and for more information on spherical harmonics. The space
$\mathcal{S}_k$ is invariant under rotations and hence provides a representation of $SO(n)$. This
representation is known to be irreducible, that is, for any subspace $E \subseteq \mathcal{S}_k$ that
is invariant under rotations, we necessarily have

\[ E = \{0\} \quad \text{or} \quad E = \mathcal{S}_k. \]

Moreover, these representations in $\mathcal{S}_k$ for $k = 0, 1, \ldots$ are known to be
inequivalent; this follows, e.g., from the fact that their dimensions (given by
$\binom{n+k-1}{k} - \binom{n+k-3}{k-2}$) are all different (assuming $n \ge 3$). Elements of
$\mathcal{S}_k$ are orthogonal to elements of $\mathcal{S}_\ell$ for $k \ne \ell$. We denote by
$\mathrm{Proj}_{\mathcal{S}_k}$ the orthogonal projection operator onto $\mathcal{S}_k$ in
$L^2(S^{n-1})$. Then any function $f \in L^2(S^{n-1})$ may be decomposed as

\[ f = \sum_{k=0}^{\infty} \mathrm{Proj}_{\mathcal{S}_k} f \]

where the sum converges in $L^2(S^{n-1})$. This decomposition of a function on $S^{n-1}$ is analogous
to the decomposition of a function on the Boolean hypercube into Fourier levels.
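
As a quick sanity check of the dimension formula above, the following minimal Python sketch evaluates
$\binom{n+k-1}{k} - \binom{n+k-3}{k-2}$ and confirms, for a few values of $n \ge 3$, that the
dimensions are strictly increasing in $k$ (and hence all different). The helper name `harmonic_dim`
and the tested ranges are our own illustrative choices, not part of the original argument.

```python
from math import comb


def harmonic_dim(n: int, k: int) -> int:
    """Dimension of the space of degree-k spherical harmonics on S^{n-1}.

    Uses binom(n+k-1, k) - binom(n+k-3, k-2); the second term is
    taken to be zero for k < 2.
    """
    lower = comb(n + k - 3, k - 2) if k >= 2 else 0
    return comb(n + k - 1, k) - lower


# For n >= 3 the dimensions grow strictly with k on the tested range.
for n in (3, 4, 10):
    dims = [harmonic_dim(n, k) for k in range(8)]
    assert all(a < b for a, b in zip(dims, dims[1:]))

# For n = 3 we recover the familiar dimensions 2k + 1.
print([harmonic_dim(3, k) for k in range(5)])  # [1, 3, 5, 7, 9]
```

For $n = 2$ the same formula gives dimension 2 for every $k \ge 1$, which is why the assumption
$n \ge 3$ is needed.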

Laplacian. Write $C^\infty(S^{n-1})$ for the space of infinitely differentiable functions on
$S^{n-1}$. For a function $f \in C^\infty(S^{n-1})$ and $x \in S^{n-1}$, we define

\[ (\Delta f)(x) = \sum_{i=1}^{n-1} \frac{d^2}{dt^2} f\bigl((\cos t)x + (\sin t)e_i\bigr) \Big|_{t=0} \tag{5} \]

where $e_1, \ldots, e_{n-1}$ is an orthonormal basis of $x^\perp$. Notice that for any orthogonal
$x, y \in S^{n-1}$, the curve $t \mapsto (\cos t)x + (\sin t)y$ draws a great circle on $S^{n-1}$
that visits $x$ at $t = 0$, and its tangent vector at $t = 0$ is the vector $y$. The right-hand side
of (5) does not depend on the choice of the orthonormal basis $e_1, \ldots, e_{n-1}$. The operator
$\Delta$, acting from $C^\infty(S^{n-1})$ to itself, is called the spherical Laplacian.

One computes (see, e.g., [SW71]) that for any $k \ge 0$ and $\varphi_k \in \mathcal{S}_k$,

\[ \Delta \varphi_k = -\lambda_k \varphi_k \tag{6} \]

where

\[ \lambda_k = k(k + n - 2). \]

The Laplacian thus has a complete system of orthonormal eigenfunctions in $L^2(S^{n-1})$ (even though
the Laplacian is defined only for smooth functions and not in the entire space $L^2(S^{n-1})$).
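
The eigenvalue relation (6) can be checked numerically straight from definition (5). The sketch
below is only an illustration under our own choices: it approximates the second derivatives along
great circles by central differences, builds an orthonormal basis of $x^\perp$ with a QR
factorization, and uses the harmonic polynomial $f(v) = v_1 v_2$ (degree $k = 2$) in dimension
$n = 6$. The function name `spherical_laplacian` and the step size `h` are hypothetical.

```python
import numpy as np


def spherical_laplacian(f, x, h=1e-4):
    """Approximate (Delta f)(x) via formula (5) with central differences in t."""
    n = x.shape[0]
    # Complete x to an orthonormal basis of R^n; the last n-1 columns of q
    # then form an orthonormal basis e_1, ..., e_{n-1} of x^perp.
    rng = np.random.default_rng(0)
    m = np.column_stack([x, rng.standard_normal((n, n - 1))])
    q, _ = np.linalg.qr(m)
    total = 0.0
    for e in q[:, 1:].T:
        def curve(t, e=e):
            # Great circle through x with tangent vector e at t = 0.
            return f(np.cos(t) * x + np.sin(t) * e)
        total += (curve(h) - 2.0 * curve(0.0) + curve(-h)) / h**2
    return total


n, k = 6, 2
f = lambda v: v[0] * v[1]            # harmonic, homogeneous polynomial of degree 2
x = np.arange(1.0, n + 1.0)
x /= np.linalg.norm(x)               # a point on S^{n-1}
lam = k * (k + n - 2)                # lambda_k = k(k + n - 2)
ratio = spherical_laplacian(f, x) / f(x)
print(ratio, -lam)                   # the two values should nearly agree
```

Up to discretization error, the ratio $(\Delta f)(x)/f(x)$ matches $-\lambda_2 = -2n$, independently
of the random basis completion, in line with the basis-independence remark above.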

Noise operator. The noise operators on $S^{n-1}$ are

\[ U_\rho = \rho^{-\Delta} \qquad (0 \le \rho \le 1). \]

A priori, these operators are defined, say, on the dense space of finite linear combinations of
spherical harmonics. Since the norm of $U_\rho$ does not exceed one, we may uniquely extend $U_\rho$
to a self-adjoint operator $U_\rho : L^2(S^{n-1}) \to L^2(S^{n-1})$ of norm one. From (6) it follows
that for any $k \ge 0$ and $\varphi_k \in \mathcal{S}_k$,

\[ U_\rho \varphi_k = \rho^{\lambda_k} \varphi_k. \]

Hypercontractivity. We proceed with a short review of hypercontractivity, a subject going back
to Nelson [Nel66]. For $p \ge 1$ and for a measurable function $f : S^{n-1} \to \mathbb{R}$ we write
$\|f\|_p = \bigl( \int_{S^{n-1}} |f|^p \, d\sigma \bigr)^{1/p}$ for the $L^p$-norm of $f$. The
hypercontractive inequality states that for any $1 \le p \le q$, and any function
$f \in L^p(S^{n-1})$,

\[ \|U_\rho f\|_q \le \|f\|_p \quad \text{for } 0 \le \rho \le \Bigl( \frac{p-1}{q-1} \Bigr)^{1/(2n-2)}. \tag{7} \]

We now briefly describe how one proves such an inequality. By differentiating with respect
to $p$ and $q$, Gross [Gro75] showed that hypercontractive inequalities such as the one above are
directly equivalent to so-called log-Sobolev inequalities. Indeed, a common technique for proving
hypercontractive inequalities is by proving the analogous log-Sobolev inequality (as the latter is
often cleaner and easier to work with). More specifically, for our hypercontractive inequality (7),
the equivalent log-Sobolev inequality turns out to be

\[ \int_{S^{n-1}} f^2 \log f^2 \, d\sigma - \Bigl( \int_{S^{n-1}} f^2 \, d\sigma \Bigr) \log \Bigl( \int_{S^{n-1}} f^2 \, d\sigma \Bigr) \le \frac{2}{n-1} \int_{S^{n-1}} |\nabla f|^2 \, d\sigma \tag{8} \]

for any smooth $f : S^{n-1} \to \mathbb{R}$, where $\nabla f$ denotes the gradient of $f$. Finally,
this (tight) inequality was proven by Rothaus [Rot86].

We note that a slightly weaker inequality, in which $\frac{2}{n-1}$ is replaced by $\frac{2}{n-2}$
(leading to a corresponding worsening of the exponent in (7) from $1/(2n-2)$ to $1/(2n-4)$), follows
from the elegant Bakry-Émery criterion (see [BE85], or e.g., [BL06]). This criterion states that a
log-Sobolev inequality holds for any connected manifold whose Ricci curvature is uniformly bounded
from below by some positive constant. In our very special case, the manifold is $S^{n-1}$, whose
Ricci curvature is constantly $n - 2$, leading to (8) with the slightly weaker constant
$\frac{2}{n-2}$. This slightly weaker version certainly suffices for all of our needs in this paper.

Kahn, Kalai, and Linial [KKL88] realized that hypercontractive inequalities such as (7) imply
certain bounds on the growth of $L^p$ norms of the Laplacian eigenfunctions. Although they focused
on the Boolean hypercube, their idea can be applied in much greater generality, and in particular
to the sphere. Indeed, suppose $\varphi_k \in \mathcal{S}_k$ for some $k \ge 0$. Then
$U_\rho \varphi_k = \rho^{\lambda_k} \varphi_k$. From (7), for any $1 < p \le q$,

\[ \|\varphi_k\|_q \le \Bigl( \frac{q-1}{p-1} \Bigr)^{\lambda_k/(2n-2)} \|\varphi_k\|_p. \tag{9} \]

For large $n$ and fixed $k$, we have $\lambda_k/(2n-2) \approx k/2$. In this case, the bound (9)
roughly says that for any $t$, the set of points $x \in S^{n-1}$ where
$|\varphi_k| \ge t \|\varphi_k\|_2$ has measure at most $C \exp(-c t^{2/k})$. The following lemma
runs in a similar vein, and provides an upper bound on the mass that the indicator function of a set
can have on each level of the spherical harmonics decomposition.

Lemma 5.3. Suppose $f : S^{n-1} \to \mathbb{R}$ satisfies $\|f\|_1 = 1$ and $\|f\|_\infty \le M$.
Then, for any $k \ge 1$,

\[ \|\mathrm{Proj}_{\mathcal{S}_k} f\|_2 \le \Bigl( e \cdot \max\Bigl( 1, \frac{\log M}{\lambda_k/(2n-2)} \Bigr) \Bigr)^{\lambda_k/(2n-2)}. \]

Proof. First, note that for any $p \ge 1$,

\[ \|f\|_p = \Bigl( \int_{S^{n-1}} |f|^p \, d\sigma \Bigr)^{1/p} \le \Bigl( M^{p-1} \int_{S^{n-1}} |f| \, d\sigma \Bigr)^{1/p} = M^{(p-1)/p} \le M^{p-1}. \]

In particular, since $\|\mathrm{Proj}_{\mathcal{S}_k} f\|_2 \le \|f\|_2 \le M$, we obtain that the
lemma holds whenever $\lambda_k \ge (2n-2) \log M$. So assume from now on that
$\lambda_k < (2n-2) \log M$. We use (7) for $q = 2$ and obtain that for any $1 < p \le 2$,

\[ \|U_\rho f\|_2 \le \|f\|_p \le M^{p-1} \quad \text{for } \rho = (p-1)^{1/(2n-2)}. \]

Projecting to $\mathcal{S}_k$, we see that for any $1 < p \le 2$,

\[ (p-1)^{\lambda_k/(2n-2)} \|\mathrm{Proj}_{\mathcal{S}_k} f\|_2 = \|\mathrm{Proj}_{\mathcal{S}_k}(U_\rho f)\|_2 \le \|U_\rho f\|_2 \le M^{p-1}. \]

We complete the proof by choosing $p = 1 + \frac{\lambda_k}{(2n-2) \log M} \le 2$. □

Radon transform. Recall that for $\theta \in S^{n-1}$ we write $\sigma_{\theta^\perp}$ for the uniform
probability measure on the sphere $S^{n-1} \cap \theta^\perp$. Then the spherical Radon transform
$R(f)$ of an integrable function $f : S^{n-1} \to \mathbb{R}$ is defined as

\[ R(f)(\theta) = \int_{S^{n-1} \cap \theta^\perp} f(x) \, d\sigma_{\theta^\perp}(x) \qquad (\theta \in S^{n-1}). \]

So $R(f)$ is simply the average of $f$ on the equator of vectors orthogonal to $\theta$. Observe that
for functions $f, g \in L^2(S^{n-1})$, we have

\[ \int_{V_n} f(x) g(y) \, d\sigma_V(x,y) = \int_{S^{n-1}} f(x) \, Rg(x) \, d\sigma(x). \tag{10} \]

This equality describes the intuitive fact that integrating uniformly over all orthogonal pairs
$(x, y)$ is the same as integrating uniformly over $x$, and then uniformly over all $y$ in the
orthogonal complement of $x$. See, e.g., Helgason [Hel99, Chapter II] for a more formal derivation.

Define a sequence of numbers $(\mu_k)_{k=0,1,\ldots}$ as follows. Suppose
$X = (X_1, \ldots, X_{n-1})$ is a random vector that is uniformly distributed in $S^{n-2}$. For an
even $k \ge 0$ denote

\[ \mu_k = (-1)^{k/2} \, \mathbb{E}[X_1^k], \]

and for odd $k$ set $\mu_k = 0$. We now show that the $\mathcal{S}_k$ are the eigenspaces of $R$, with
$\mu_k$ being the corresponding eigenvalues.

Lemma 5.4. For any $k \ge 0$ and $\varphi_k \in \mathcal{S}_k$,

\[ R(\varphi_k) = \mu_k \varphi_k. \]

Proof. The Radon transform clearly commutes with rotations. Therefore, because the
$\mathcal{S}_k$'s give rise to inequivalent irreducible representations, Schur's lemma implies that
$R$ must have the $\mathcal{S}_k$'s as its eigenspaces. We briefly recall the proof of this standard
representation-theoretic fact. Consider the restriction $R_{k,j}$ of
$\mathrm{Proj}_{\mathcal{S}_j} R$ to an operator from $\mathcal{S}_k$ to $\mathcal{S}_j$ for some
$k, j \ge 0$. Our goal is to show that $R_{k,j}$ is zero whenever $k \ne j$ and a multiple of the
identity otherwise. Since $R_{k,j}$ commutes with the action of $SO(n)$, and $\mathcal{S}_k$ is
irreducible, we have that $\ker R_{k,j}$ is either all of $\mathcal{S}_k$ or $\{0\}$. In the former
case $R_{k,j} = 0$ and we are done, so assume the latter case. By the same argument the image of
$R_{k,j}$ is either all of $\mathcal{S}_j$ or $\{0\}$, and since we assumed $R_{k,j} \ne 0$, it must
be the former. Hence $R_{k,j}$ is an isomorphism between the representation on $\mathcal{S}_k$ and
on $\mathcal{S}_j$, which is impossible when $k \ne j$ since we know that $\mathcal{S}_k$ and
$\mathcal{S}_j$ are inequivalent representations. So assume $k = j$, and let
$\lambda \in \mathbb{R}$ be an arbitrary eigenvalue of $R_{k,k}$ (there exists such an eigenvalue
since $R_{k,k}$ is a symmetric operator). Then the kernel of $\lambda I - R_{k,k}$ must also be
either all of $\mathcal{S}_k$ or $\{0\}$; the latter is impossible since $\lambda$ is an eigenvalue,
hence we necessarily have $R_{k,k} = \lambda I$.

Our next goal is to show that the $\mu_k$'s are the corresponding eigenvalues. Fix some arbitrary
$e \in S^{n-1}$. For $k \ge 0$, we define the function $f_k : S^{n-1} \to \mathbb{R}$ by

\[ f_k(x) = G_k(x \cdot e) \qquad (x \in S^{n-1}) \]

where $G_k : [-1, 1] \to \mathbb{R}$ is the Gegenbauer polynomial (see, e.g., Müller [Mul66]),

\[ G_k(t) = \mathbb{E}\bigl( t + i X_1 \sqrt{1 - t^2} \bigr)^k. \]

Here, $i^2 = -1$ and $X = (X_1, \ldots, X_{n-1})$ is a random vector that is distributed uniformly
over the sphere $S^{n-2}$. The function $f_k$ is known to be a spherical harmonic of degree $k$,
i.e., in $\mathcal{S}_k$ [Mul66], and by our above discussion, must be an eigenfunction of $R$, i.e.,
$R f_k$ is proportional to $f_k$. From the definition of the Radon transform,

\[ (R f_k)(e) = G_k(0) \quad \text{and} \quad f_k(e) = G_k(1) = 1. \]

We conclude that $G_k(0)$ is the eigenvalue corresponding to $\mathcal{S}_k$. It remains to notice
that $G_k(0)$ vanishes for odd $k$ and equals $(-1)^{k/2} \, \mathbb{E} X_1^k$ for even $k$, and hence
equals $\mu_k$ for all $k$. □

The next technical lemma gives upper bounds on the eigenvalues $\mu_k$.

Lemma 5.5. Suppose $n \ge 10$. Then, the sequence $|\mu_0|, |\mu_2|, |\mu_4|, \ldots$ is
non-increasing, and moreover, for all $k \ge 1$,

\[ |\mu_{2k}| \le \Bigl( \frac{Ck}{n} \Bigr)^{k}. \]

Proof. The first claim follows immediately from the fact that $|X_1| \le 1$ and
$|\mu_{2k}| = \mathbb{E}[|X_1|^{2k}]$. For the second claim, notice that the density of $X_1$ is
proportional to $(1 - x^2)^{(n-4)/2}$ for $x \in [-1, 1]$, and vanishes outside this interval. Hence,
our goal is to prove that for all even $k \ge 2$,

\[ \int_{-1}^{1} x^k (1 - x^2)^{(n-4)/2} \, dx \le \Bigl( \frac{Ck}{n} \Bigr)^{k/2} \int_{-1}^{1} (1 - x^2)^{(n-4)/2} \, dx. \]

The integral on the right-hand side is at least $c/\sqrt{n}$ (this is true even for the integral
from $-1/\sqrt{n}$ to $1/\sqrt{n}$). The integral on the left-hand side may be estimated as follows:

\[ \int_{-1}^{1} x^k (1 - x^2)^{(n-4)/2} \, dx \le \int_{-1}^{1} x^k e^{-(n-4)x^2/2} \, dx \le \int_{-\infty}^{\infty} x^k e^{-(n-4)x^2/2} \, dx. \]

The latter integral is exactly the $k$th moment of a normal variable with mean $0$ and variance
$1/(n-4)$, times the missing normalization factor of $\sqrt{2\pi/(n-4)}$. A standard fact is that for
even $k$ this moment is

\[ (n-4)^{-k/2} \cdot (k-1)!! \le \Bigl( \frac{Ck}{n} \Bigr)^{k/2}, \]

where $(k-1)!! = (k-1)(k-3) \cdots 1$. The lemma follows. □
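
The two claims of Lemma 5.5 can also be observed numerically. The sketch below is a hedged
illustration rather than part of the proof: it relies on the standard closed form
$\mathbb{E}[X_1^{2k}] = \prod_{j=0}^{k-1} \frac{2j+1}{d+2j}$ for a uniform vector on the unit sphere
of $\mathbb{R}^d$ (here $d = n-1$), which is not stated in the text above, and it tests the bound
$(Ck/n)^k$ with the specific value $C = 2$, an assumption made only for this example. The name
`abs_mu` is ours.

```python
from fractions import Fraction


def abs_mu(n: int, k: int) -> Fraction:
    """|mu_{2k}| = E[X_1^{2k}] for X uniform on S^{n-2}, the unit sphere of R^{n-1}."""
    d = n - 1  # ambient dimension of the sphere S^{n-2}
    value = Fraction(1)
    for j in range(k):
        value *= Fraction(2 * j + 1, d + 2 * j)
    return value


n = 50
mus = [abs_mu(n, k) for k in range(30)]

# First claim: |mu_0|, |mu_2|, |mu_4|, ... is non-increasing.
assert all(a >= b for a, b in zip(mus, mus[1:]))

# Second claim, tested with the illustrative constant C = 2.
assert all(abs_mu(n, k) <= Fraction(2 * k, n) ** k for k in range(1, 30))

print(float(abs_mu(n, 1)), float(abs_mu(n, 2)))
```

Exact rational arithmetic is used so that the comparisons are not affected by rounding.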
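Proof of Theorem 5.2. It suffices to prove the theorem under the assumption that $n \ge 10$
(otherwise there is no $s \le cn$, for a sufficiently small universal constant $c > 0$). By Lemma 5.4
and (10),

\[ \int_{V_n} f(x) g(y) \, d\sigma_V(x,y) = \int_{S^{n-1}} f R(g) \, d\sigma = \sum_{k=0}^{\infty} \mu_k \int_{S^{n-1}} \mathrm{Proj}_{\mathcal{S}_k}(f) \, \mathrm{Proj}_{\mathcal{S}_k}(g) \, d\sigma. \]

Note that $\mu_0 = 1$ and $\mathrm{Proj}_{\mathcal{S}_0}(f) = \mathrm{Proj}_{\mathcal{S}_0}(g) = 1$.
Therefore, since $\mu_k$ vanishes for odd $k$, by the Cauchy-Schwarz inequality

\[ \Bigl| \int_{V_n} f(x) g(y) \, d\sigma_V(x,y) - 1 \Bigr| \le \sum_{k=1}^{\infty} |\mu_{2k}| \, \|\mathrm{Proj}_{\mathcal{S}_{2k}} f\|_2 \, \|\mathrm{Proj}_{\mathcal{S}_{2k}} g\|_2. \]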

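We will prove the theorem by showing that the latter sum is at most $C\alpha\beta/n$, where
$\alpha = \log(2\|f\|_\infty)$ and $\beta = \log(2\|g\|_\infty)$. Observe that
$\alpha, \beta \ge 1/2$ and recall our assumption that $\alpha\beta$ is at most $cn$. We start by
analyzing the part of the sum in which $k$ runs from $1$ to $T - 1$, where
$T = \lfloor \delta n \rfloor$ for some sufficiently small constant $\delta > 0$. Using Lemmas 5.3
and 5.5 we have the bounds

\[ |\mu_{2k}| \le \Bigl( \frac{Ck}{n} \Bigr)^{k}, \qquad \|\mathrm{Proj}_{\mathcal{S}_{2k}} f\|_2 \le \Bigl( C \max\Bigl( 1, \frac{\alpha}{k} \Bigr) \Bigr)^{\lambda_{2k}/(2n-2)}, \]

and similarly for $g$ with $\beta$. Therefore,

\[ \sum_{k=1}^{T-1} |\mu_{2k}| \, \|\mathrm{Proj}_{\mathcal{S}_{2k}} f\|_2 \, \|\mathrm{Proj}_{\mathcal{S}_{2k}} g\|_2 \le \sum_{k=1}^{T-1} \Bigl( \frac{Ck}{n} \Bigr)^{k} \Bigl( C \max\Bigl( 1, \frac{\alpha}{k} \Bigr) \Bigr)^{\lambda_{2k}/(2n-2)} \Bigl( C \max\Bigl( 1, \frac{\beta}{k} \Bigr) \Bigr)^{\lambda_{2k}/(2n-2)}. \]

The term $k = 1$ is at most

\[ \frac{C\alpha\beta}{n}. \]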



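We will now show that the terms in the latter sum decay geometrically, and hence we can also
bound the sum by $C\alpha\beta/n$. To this end, first notice that

\[ \Bigl( \frac{C(k+1)}{n} \Bigr)^{k+1} \Big/ \Bigl( \frac{Ck}{n} \Bigr)^{k} = \frac{C(k+1)}{n} \Bigl( \frac{k+1}{k} \Bigr)^{k} \le \frac{Ck}{n}. \]

Second,

\[ \Bigl( C \max\Bigl( 1, \frac{\alpha}{k+1} \Bigr) \Bigr)^{\lambda_{2k+2}/(2n-2)} \le \Bigl( C \max\Bigl( 1, \frac{\alpha}{k} \Bigr) \Bigr)^{\lambda_{2k}/(2n-2)} \Bigl( C \max\Bigl( 1, \frac{\alpha}{k} \Bigr) \Bigr)^{(\lambda_{2k+2} - \lambda_{2k})/(2n-2)} \le \Bigl( C \max\Bigl( 1, \frac{\alpha}{k} \Bigr) \Bigr)^{\lambda_{2k}/(2n-2)} \cdot C \max\Bigl( 1, \frac{\alpha}{k} \Bigr), \]

and similarly for $\beta$. Hence the ratio between the term for $k + 1$ and that for $k$ is at most

\[ C \cdot \frac{k}{n} \max\Bigl( 1, \frac{\alpha}{k} \Bigr) \max\Bigl( 1, \frac{\beta}{k} \Bigr) \le \frac{1}{2}, \]

as $k \le \delta n$, once we choose $\delta$ to be a sufficiently small positive universal constant.
This implies that we can upper bound the sum from $1$ to $T - 1$ by $C\alpha\beta/n$, as required.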

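It remains to analyze the less significant part of the sum, in which $k$ runs from
$T = \lfloor \delta n \rfloor$ to infinity. Then, by Lemma 5.5 and another application of
Cauchy-Schwarz,

\[ \sum_{k=T}^{\infty} |\mu_{2k}| \, \|\mathrm{Proj}_{\mathcal{S}_{2k}} f\|_2 \, \|\mathrm{Proj}_{\mathcal{S}_{2k}} g\|_2 \le |\mu_{2T}| \sum_{k=T}^{\infty} \|\mathrm{Proj}_{\mathcal{S}_{2k}} f\|_2 \, \|\mathrm{Proj}_{\mathcal{S}_{2k}} g\|_2 \le |\mu_{2T}| \, \|f\|_2 \, \|g\|_2 \le \exp(\alpha + \beta - cn) \le \frac{C}{n} \le \frac{C\alpha\beta}{n} \]

under the legitimate assumption that $\alpha\beta \le cn$. We conclude that the entire sum is bounded
by $C\alpha\beta/n$. □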



6 Sampling Sets by Lower Dimensional Subspaces 



Our next step is to iterate Theorem 5.2 using a certain martingale process, in order to obtain a 
corresponding theorem for the Grassmannian. The constants 0.1 and 9/10 appearing below do 
not play any special role and can be replaced with any other constants (as long as the former is 
positive and the latter is smaller than 1). 

Theorem 6.1. Let $1 \le m \le 9n/10$. Suppose that $A \subseteq S^{n-1}$ is a measurable set with
$\sigma(A) \ge C \exp(-cn^{1/3})$. Assume that $H$ is a uniformly chosen $(n-m)$-dimensional
subspace. Then,

\[ \Bigl| \frac{\sigma_H(A \cap H)}{\sigma(A)} - 1 \Bigr| \le 0.1 \]

except with probability at most $C \exp(-cn^{1/3})$. Here, $c, C > 0$ are universal constants.

We start with a few technical lemmas. The first one below bounds the moments of a random 
variable that has an exponentially decaying tail around 1. We will apply it with random variables 
whose expectation is very close to 1. 

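Lemma 6.2. Let $R, \delta > 0$ and let $Z$ be a non-negative random variable satisfying that for any
$t \ge 0$,

\[ \mathbb{P}(|Z - 1| \ge t) \le R \exp(-t/\delta). \]

Then, for any $2 \le \ell \le (2\delta)^{-1}$,

\[ \mathbb{E}[Z^\ell] \le 1 + \ell \, \mathbb{E}[Z - 1] + 2R(\ell\delta)^2. \]

Proof. Using the Taylor expansion, we have that for any $x \ge -1$,

\[ (1 + x)^\ell = 1 + \ell x + \sum_{k=2}^{\lfloor \ell \rfloor - 1} \binom{\ell}{k} x^k + \binom{\ell}{\lfloor \ell \rfloor} (1 + \xi)^{\ell - \lfloor \ell \rfloor} x^{\lfloor \ell \rfloor} \le 1 + \ell x + \sum_{k=2}^{\lfloor \ell \rfloor} \frac{\ell^k}{k!} |x|^k + \frac{\ell^{\lfloor \ell \rfloor}}{\lfloor \ell \rfloor !} |x|^{\lfloor \ell \rfloor + 1} \]

where $\xi$ is some real number between $x$ and $0$. Next, for any $k \ge 1$,

\[ \mathbb{E}[|Z - 1|^k] = \int_0^\infty \mathbb{P}[|Z - 1|^k \ge t] \, dt = \int_0^\infty k t^{k-1} \, \mathbb{P}[|Z - 1| \ge t] \, dt \le Rk \int_0^\infty t^{k-1} \exp(-t/\delta) \, dt = R \cdot k! \cdot \delta^k. \tag{11} \]

Combining the two inequalities, we obtain

\[ \mathbb{E}[Z^\ell] \le 1 + \ell \, \mathbb{E}[Z - 1] + R \sum_{k=2}^{\lfloor \ell \rfloor} (\ell\delta)^k + R(\ell\delta)^{\lfloor \ell \rfloor} (\lfloor \ell \rfloor + 1)\delta \le 1 + \ell \, \mathbb{E}[Z - 1] + 2R(\ell\delta)^2. \]

□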
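
To make Lemma 6.2 concrete, the following sketch checks it on one explicit example of our own
choosing: $Z = 1 + \delta E$ with $E$ a standard exponential variable, so that
$\mathbb{P}(|Z - 1| \ge t) = \exp(-t/\delta)$ and the hypothesis holds with $R = 1$. For integer
$\ell$ the moment $\mathbb{E}[Z^\ell]$ is computed exactly from the binomial theorem and
$\mathbb{E}[E^k] = k!$. The function names and the tested parameter pairs are hypothetical, and only
integer values of $\ell$ are covered.

```python
from math import comb, factorial


def moment_shifted_exponential(ell: int, delta: float) -> float:
    """E[Z^ell] for Z = 1 + delta * E with E ~ Exp(1), using E[E^k] = k!."""
    return sum(comb(ell, k) * factorial(k) * delta**k for k in range(ell + 1))


def lemma_bound(ell: int, delta: float, R: float = 1.0) -> float:
    """Right-hand side 1 + ell * E[Z - 1] + 2R(ell * delta)^2, with E[Z - 1] = delta."""
    return 1.0 + ell * delta + 2.0 * R * (ell * delta) ** 2


for ell, delta in [(2, 0.25), (5, 0.1), (10, 0.05), (50, 0.01)]:
    # The lemma requires 2 <= ell <= 1 / (2 * delta), i.e. ell * delta <= 1/2.
    assert ell >= 2 and ell * delta <= 0.5 + 1e-12
    assert moment_shifted_exponential(ell, delta) <= lemma_bound(ell, delta)
```

All four parameter pairs sit at the boundary $\ell\delta = 1/2$, where the bound is closest to being
violated.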



Our second lemma bounds the upper tail of a certain martingale-like product and is based on 
a Bernstein-type inequality. We then derive as a corollary a similar bound on the lower tail. 

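Lemma 6.3. Let $R, \delta > 0$ and let $Z_1, \ldots, Z_k$ be non-negative random variables where
$k \le 1/(320R\delta^2)$. Assume that for all $1 \le i \le k$, when conditioning on any values of
$Z_1, \ldots, Z_{i-1}$, we almost surely have

\[ \mathbb{E}[Z_i \mid Z_1, \ldots, Z_{i-1}] \le 1 + \frac{1}{20k}, \tag{12} \]

\[ \mathbb{P}[|Z_i - 1| \ge t \mid Z_1, \ldots, Z_{i-1}] \le R \exp(-t/\delta) \quad \text{for all } t \ge 0. \tag{13} \]

Then,

\[ \mathbb{P}\Bigl[ \prod_{i=1}^{k} Z_i \ge 1.1 \Bigr] \le \begin{cases} \exp(-1/(80\delta) + Rk/2), & k \le 1/(80R\delta), \\ \exp(-1/(12800Rk\delta^2)), & \text{otherwise.} \end{cases} \]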



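Proof. Let $2 \le \ell \le (2\delta)^{-1}$ be a real number to be determined later on. Then, by
Lemma 6.2,

\[ \mathbb{E}\Bigl[ \prod_{i=1}^{k} Z_i^\ell \Bigr] = \mathbb{E}\Bigl[ \prod_{i=1}^{k-1} Z_i^\ell \cdot \mathbb{E}\bigl[ Z_k^\ell \mid Z_1, \ldots, Z_{k-1} \bigr] \Bigr] \le \Bigl( 1 + \frac{\ell}{20k} + 2R(\ell\delta)^2 \Bigr) \mathbb{E}\Bigl[ \prod_{i=1}^{k-1} Z_i^\ell \Bigr] \le \cdots \le \Bigl( 1 + \frac{\ell}{20k} + 2R(\ell\delta)^2 \Bigr)^{k} \le \exp\Bigl( \frac{\ell}{20} + 2Rk(\ell\delta)^2 \Bigr). \]

Therefore, by Markov's inequality,

\[ \mathbb{P}\Bigl[ \prod_{i=1}^{k} Z_i \ge 1.1 \Bigr] \le 1.1^{-\ell} \exp\Bigl( \frac{\ell}{20} + 2Rk(\ell\delta)^2 \Bigr) \le \exp\Bigl( -\frac{\ell}{40} + 2Rk(\ell\delta)^2 \Bigr). \]

The minimum of the right-hand side over $\ell$ is $\exp(-1/(12800Rk\delta^2))$ and is obtained for
$\ell = 1/(160Rk\delta^2)$. We set $\ell$ to this value, unless it is greater than $1/(2\delta)$, in
which case we set $\ell = 1/(2\delta)$. The lemma follows. □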
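
The optimization over $\ell$ in the last step of the proof is easy to double-check numerically. In
the sketch below, the parameter values $R = 1$, $k = 100$, $\delta = 10^{-3}$ are hypothetical and
chosen only so that $k \le 1/(320R\delta^2)$ holds and the minimizer lies in the allowed range
$[2, 1/(2\delta)]$; the function name `exponent` is ours.

```python
def exponent(ell: float, R: float, k: int, delta: float) -> float:
    """Exponent -ell/40 + 2Rk(ell * delta)^2 from the final bound in the proof."""
    return -ell / 40.0 + 2.0 * R * k * (ell * delta) ** 2


R, k, delta = 1.0, 100, 1e-3                       # hypothetical parameters
ell_star = 1.0 / (160.0 * R * k * delta**2)        # claimed minimizer
closed_form = -1.0 / (12800.0 * R * k * delta**2)  # claimed minimum value

# The minimizer is admissible: 2 <= ell_star <= 1 / (2 * delta).
assert 2.0 <= ell_star <= 1.0 / (2.0 * delta)

# The value at ell_star matches the closed form, and a coarse grid search
# over (0, 3 * ell_star] does not find anything smaller.
best = exponent(ell_star, R, k, delta)
grid_min = min(exponent(ell_star * s / 1000.0, R, k, delta) for s in range(1, 3001))
print(best, closed_form, grid_min)
```

With these parameters $k > 1/(80R\delta)$, so the example falls under the second case of the lemma.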
Corollary 6.4. Let $R, \delta > 0$ and let $Z_1, \ldots, Z_k$ be random variables taking values in
$(1/2, \infty)$ where $k \le 1/(1280R\delta^2)$. Assume that for all $1 \le i \le k$, conditioning on
any values of $Z_1, \ldots, Z_{i-1}$, we almost surely have

\[ \mathbb{E}[Z_i \mid Z_1, \ldots, Z_{i-1}] \ge 1 - \frac{1}{40k}, \]

\[ \mathbb{P}[|Z_i - 1| \ge t \mid Z_1, \ldots, Z_{i-1}] \le R \exp(-t/\delta) \quad \text{for all } t \ge 0. \]

Then,

\[ \mathbb{P}\Bigl[ \prod_{i=1}^{k} Z_i \le 0.9 \Bigr] \le \begin{cases} \exp(-1/(160\delta) + Rk/2), & k \le 1/(160R\delta), \\ \exp(-1/(51200Rk\delta^2)), & \text{otherwise.} \end{cases} \]

Proof. We simply apply Lemma 6.3 to the random variables $Z_1^{-1}, \ldots, Z_k^{-1}$ with $R$ and
$2\delta$. Eq. (13) holds because for all $t \ge 0$ and $x \ge 1/2$, if $|x^{-1} - 1| \ge t$ then
also $|x - 1| \ge t/2$. For Eq. (12), we use the inequality
$x^{-1} \le 1 - (x - 1) + 2(x - 1)^2$, valid for all $x \ge 1/2$. This implies that

\[ \mathbb{E}[Z_i^{-1} \mid Z_1, \ldots, Z_{i-1}] \le 1 + \frac{1}{40k} + 2 \, \mathbb{E}[(Z_i - 1)^2 \mid Z_1, \ldots, Z_{i-1}] \le 1 + \frac{1}{40k} + 4R\delta^2 \le 1 + \frac{1}{20k}, \]

where the next-to-last inequality follows from the calculation in (11). □

Proof of Theorem 6.1. Fix $1 \le m \le 9n/10$ and a set $A \subseteq S^{n-1}$. Consider a sequence of
random subspaces in $\mathbb{R}^n$,

\[ \mathbb{R}^n = H_0 \supset H_1 \supset H_2 \supset \cdots \supset H_m \]

in which $\dim(H_i) = n - i$, defined as follows. For each $i \ge 1$, the subspace $H_i$ is chosen
uniformly in the Grassmannian of all $(n-i)$-dimensional subspaces of $H_{i-1}$. An important
observation, which follows from the uniqueness of the Haar measure, is that the subspace $H_i$ is
distributed uniformly over $G_{n,n-i}$, and in particular, $H_m$ is a uniform $(n-m)$-dimensional
subspace. For $k = 1, \ldots, m$ define the random variable

\[ X_k = \frac{\sigma_{H_k}(A \cap H_k)}{\sigma_{H_{k-1}}(A \cap H_{k-1})} \]

where $\sigma_{H_k}$ is the uniform measure on the sphere $S^{n-1} \cap H_k$. If the denominator
vanishes, we set the random variable to 1. Notice that

\[ X_1 X_2 \cdots X_m = \frac{\sigma_{H_m}(A \cap H_m)}{\sigma(A)} \]

and hence our goal is to show that this product is in $1 \pm 0.1$ except with probability at most
$C \exp(-cn^{1/3})$. We will do this by applying Lemma 6.3 and Corollary 6.4 to a regularized
version of $X_1, \ldots, X_m$ defined below.

We note three properties of the random variables $X_k$. First, we have that for any
$1 \le k \le m$, conditioned on any values of $H_1, \ldots, H_{k-1}$,

\[ \mathbb{E}(X_k \mid H_1, \ldots, H_{k-1}) = 1. \]

This holds since $H_k$ is distributed uniformly over the Grassmannian of subspaces of $H_{k-1}$.
Second, by definition, $X_k$ is bounded from above by $1/\sigma_{H_{k-1}}(A \cap H_{k-1})$. Finally,
by Theorem 5.1, for any $0 \le t \le 1$,

\[ \mathbb{P}(|X_k - 1| \ge t \mid H_1, \ldots, H_{k-1}) \le C \exp\bigl( -c(n-k+1)t / \log(2/\sigma_{H_{k-1}}(A \cap H_{k-1})) \bigr) \le C \exp\bigl( -cnt / \log(2/\sigma_{H_{k-1}}(A \cap H_{k-1})) \bigr), \]

where we used the fact that $k \le m \le 9n/10$. Because this tail bound holds only for $t \le 1$,
we cannot apply Lemma 6.3 and Corollary 6.4 directly, and instead proceed below to define the
regularized random variables $Z_1, \ldots, Z_m$.

Next, for $0 \le k \le m$, we define the "bad" event $B_k$ as the event that
$X_1 X_2 \cdots X_k < 1/2$, and for $1 \le k \le m$, the "bad" event $C_k$ as the event that
$|X_k - 1| > 1/2$. Condition on any $H_1, \ldots, H_{k-1}$ such that $B_{k-1}$ does not occur. In
this case, $\sigma_{H_{k-1}}(A \cap H_{k-1}) \ge \sigma(A)/2$. Hence, $X_k$ is upper bounded by
$2/\sigma(A) \le C \exp(cn^{1/3})$. Moreover, for any $0 \le t \le 1$ the probability that
$|X_k - 1| \ge t$ is at most $C \exp(-cnt/\log(4/\sigma(A))) \le C \exp(-cn^{2/3} t)$, and in
particular the probability that $C_k$ occurs is at most $C \exp(-cn^{2/3})$. For $1 \le k \le m$, we
define the random variable $Z_k$ as follows: if either $B_{k-1}$ or $C_k$ occurs, $Z_k$ is 1.
Otherwise, $Z_k = X_k$.

We can now finally apply Lemma 6.3 and Corollary 6.4: for each $1 \le k \le m$, we apply them
to the sequence $Z_1, \ldots, Z_k$ with $R = C$ and $\delta = Cn^{-2/3}$. To see why the conditions
there hold, condition on any $H_1, \ldots, H_{k-1}$, and assume first that $B_{k-1}$ does not occur.
Then

\[ \bigl| \mathbb{E}[Z_k \mid H_1, \ldots, H_{k-1}] - 1 \bigr| = \bigl| \mathbb{E}[Z_k - X_k \mid H_1, \ldots, H_{k-1}] \bigr| \le \mathbb{P}[C_k \mid H_1, \ldots, H_{k-1}] \cdot C \exp(cn^{1/3}) \le C \exp(-cn^{2/3}). \]

Moreover, for all non-negative $t$, the probability that $|Z_k - 1| \ge t$ is at most
$C \exp(-cn^{2/3} t)$. Finally, these two statements are obviously true even when $B_{k-1}$ does
occur (since in this case $Z_k$ is simply 1), hence we obtain that the two statements hold
conditioned on any $H_1, \ldots, H_{k-1}$ (and in particular, on any $Z_1, \ldots, Z_{k-1}$). As a
result, the lemma and the corollary imply that for each $1 \le k \le m$,
$|Z_1 \cdots Z_k - 1| > 0.1$ with probability at most $C \exp(-cn^{1/3})$. Moreover, by a union
bound, the probability that there exists a $k$ for which $|Z_1 \cdots Z_k - 1| > 0.1$, an event
which we denote by $D$, is at most

\[ \mathbb{P}[D] \le m \cdot C \exp(-cn^{1/3}) \le C \exp(-cn^{1/3}). \tag{14} \]

Next, we claim that for any $1 \le k \le m$,

\[ \mathbb{P}[\neg D \wedge \neg C_1 \wedge \cdots \wedge \neg C_{k-1} \wedge C_k] \le C \exp(-cn^{2/3}). \tag{15} \]

To see why, notice that $\neg C_1$ implies that $X_1 = Z_1$, which together with $\neg D$ implies
that $\neg B_1$; the latter, in turn, implies that $X_2 = Z_2$ (since neither $B_1$ nor $C_2$
happens), which implies that $B_2$ does not happen either; etc. As a result, we obtain that
$\neg B_{k-1}$, which implies that the probability of $C_k$ is at most $C \exp(-cn^{2/3})$, as
desired.

By summing all the probabilities in (14) and (15), we obtain that

\[ \mathbb{P}[\neg D \wedge \neg C_1 \wedge \cdots \wedge \neg C_m] \ge 1 - C \exp(-cn^{1/3}). \]

It remains to notice, using the same argument as above, that this event implies that for all $k$,
$Z_k = X_k$, and therefore also that $|X_1 \cdots X_m - 1| \le 0.1$. □



The only thing remaining is to derive Lemma 4.3 from Theorem 6.1. We restate it here in a
slightly more general form.

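Lemma 6.5. Let $1 \le m \le 9n/10$. Suppose that $A \subseteq S^{n-1}$ and
$B \subseteq G_{n,n-m}$ are measurable sets with

\[ \sigma(A) \ge C \exp(-cn^{1/3}), \qquad \sigma_G(B) \ge C \exp(-cn^{1/3}) \]

for some universal constants $c, C > 0$. Then,

\[ \sigma_I\bigl( (A \times B) \cap I_{n,m} \bigr) \ge 0.8 \, \sigma(A) \, \sigma_G(B). \]

Proof. Notice that $\sigma_I((A \times B) \cap I_{n,m})/\sigma_G(B)$ may be interpreted as the
probability that when choosing a subspace $H$ uniformly from $B$ and a uniform vector $x$ in
$H \cap S^{n-1}$, we have $x \in A$. To analyze this probability, denote by
$E \subseteq G_{n,n-m}$ the set of all $(n-m)$-dimensional subspaces $H$ for which

\[ \frac{\sigma_H(A \cap H)}{\sigma(A)} < 0.9. \]

Then, by Theorem 6.1, $\sigma_G(E) \le C \exp(-cn^{1/3})$. Next, observe that the probability that
$H \in E$ is at most $\sigma_G(E)/\sigma_G(B)$. Moreover, if $H \notin E$, then by definition, the
probability that $x \in A$ is at least $0.9 \, \sigma(A)$. Hence,

\[ \frac{\sigma_I\bigl( (A \times B) \cap I_{n,m} \bigr)}{\sigma_G(B)} \ge \Bigl( 1 - \frac{\sigma_G(E)}{\sigma_G(B)} \Bigr) \cdot 0.9 \, \sigma(A) \ge 0.8 \, \sigma(A), \]

assuming the universal constants are chosen properly. □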